AWS: Add note about suspending AZRebalance #1802
According to some user reports [1], you can actually run cluster-autoscaler against an ASG that spans multiple AZs; you just have to suspend the AZRebalance scaling process to avoid unexpected node termination. [1] https://kubernetes.slack.com/archives/C8SH2GSL9/p1552600210276600?thread_ts=1552420686.257000&cid=C8SH2GSL9
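For reference, a minimal sketch of what suspending AZRebalance could look like from Python. It assumes a boto3 Auto Scaling client is passed in (for example one created with `boto3.client("autoscaling")`); the helper name `suspend_az_rebalance` and the ASG name in the comment are hypothetical and not part of this PR.

```python
def suspend_az_rebalance(autoscaling_client, asg_name):
    """Suspend only the AZRebalance process on a single Auto Scaling group.

    `autoscaling_client` is assumed to be a boto3 Auto Scaling client.
    Other scaling processes (Launch, Terminate, ...) are left untouched.
    """
    return autoscaling_client.suspend_processes(
        AutoScalingGroupName=asg_name,
        ScalingProcesses=["AZRebalance"],
    )


# Hypothetical usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   suspend_az_rebalance(boto3.client("autoscaling"), "my-multi-az-asg")
```

Passing the client in, rather than creating it inside the helper, keeps the call easy to exercise against a stub before pointing it at a real ASG.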
/joke
@mgalgs: Why was the broom late for the meeting? He overswept.
Rebalance is not the only reason why CA doesn't support multi-AZ node groups. The core logic of CA works by taking a random existing node and assuming that any new node in the same ASG will look exactly the same. In a multi-AZ ASG, the new node can land in a different zone than CA assumes, which can lead to incorrect autoscaling decisions (an unnecessary scale-up, or no scale-up at all). This commonly leads to issues, especially when using PVs in multi-AZ clusters. A recent example: kubernetes/kubernetes#75402. It may work ok-ish with a multi-AZ ASG if you disable rebalancing, don't use storage, don't use podAffinity with a topology other than host, don't use nodeAffinity on the zone label, never scale any zone to 0, ...
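To make the failure mode concrete, here is a toy model of the reasoning above. It is not CA's actual code: the `Node` type, the function names, and the zone names are all made up for illustration, and the only scheduling rule modeled is "the pod must run in the zone where its PV lives".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    zone: str


def pod_fits(required_zone, node):
    """Toy scheduling check: the pod only fits nodes in its PV's zone."""
    return node.zone == required_zone


def classify(required_zone, template_node, launch_zone):
    """Compare the template-based prediction with what the ASG would launch.

    template_node: the existing node picked as a stand-in for any new node.
    launch_zone:   the zone the multi-AZ ASG would actually place a new node in.
    """
    predicted_helps = pod_fits(required_zone, template_node)
    actually_helps = launch_zone == required_zone
    if predicted_helps and not actually_helps:
        return "unnecessary scale-up"  # node added, pod still cannot schedule
    if not predicted_helps and actually_helps:
        return "missed scale-up"  # no node added although one would have helped
    return "correct decision"


# Pod is pinned to zone "b" by its PV; the template happens to be in "b",
# but the ASG launches the new node in "a".
print(classify("zone-b", Node("n1", "zone-b"), "zone-a"))
```

With a single-zone ASG, `template_node.zone` and `launch_zone` are always equal, so only the "correct decision" branch is reachable in this model.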
Got it, thanks for the explanation!
NOTE: I've only been running in this configuration for about 1 day, so I can't personally vouch for the correctness of this workaround. As mentioned in the commit text above, other users have reported running in this configuration without issues. Would be great to get confirmation from an expert though ;)